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Abstract 


In  this  paper  we  provide  a  review  of  the  available 
methods  for  estimating  the  standard  error  of  M-  and 
(^-estimates  in  regression.  In  the  case  of  M-estimates, 
we  show  how  to  use  MINITAB  to  compute  these  estimates  along 
with  estimates  of  their  standard  errors. 


Key  words:  M-estimate,  t^-estimate,  robust  regression,  standard 
error,  bootstrap. 
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1.  Introduction 

^  Over  the  last  two  decades  there  has  been  much  interest 
in  the  statistical  literature  in  alternative  methods  to  least 
squares  for  fitting  equations  to  data.  During  this  time  a  large 
number  of  estimates  of  regression  coefficients  have  been  proposed 
that  are  not  unduly  affected  by  a  small  percentage  of  the  data 
(so-called  robust  estimates).  Although  the  robustness  properties  of 
these  estimates  have  been  studied  in  great  detail,  little  attention 
has  been  paid  to  the  problem  of  estimating  the  asymptotic  covariance 
matrices  of  these  estimates.  Such  estimates  are  necessary  if 
inferences  are  to  be  made  about  the  unknown  regression  parameters. 

— lir  this  paper  provide  .a  brief  description  of  two  popular 

robust  regression  estimates,  namely  M-  and  l j -estimates .  review; 
the  available  methods  for  estimating  the  asymptotic  covariance 

f 

matrices  of  each  of  these  estimates.  In  the  case  of  M-estimates,  ~we 
show,  ^how  to  use  MINITAB  to  compute  the  robust  estimates  along  with  an 
estimate  of  their  asymptotic  covariance  matrix.  Finally,  the 
different  robust  estimates  and  their  estimated  covariance  matrices  are 
compared  via  an  example.  _ 

2.  Methods  of  estimating  the  asymptotic  covariance  matrices  of 
robust  regression  estimates . 

Consider  the  linear  regression  model  specified  by 
Y  =  Xp  +  t 
T 

where  Y  =  ^  Y  ^ , Y  g  t • • * i Y  n ) ,  X  is  an  n*(p+l)  full  rank  matrix 

T 

of  known  constants,  ft  =  ( ,  p ,,...,  p  )  a  vector  of  unknown 


T 

parameters,  and  £  =  (ci»*2**'*,cn)  a  vector  i*i*d.  random  errors 

from  a  distribution  with  median  0  and  density  f. 

2.1.  M-estiaates. 

Throughout  this  section  we  shall  assume  that  the  distribution 
of  the  random  errors  is  symmetric. 

An  M-estimate  is  defined  as  the  solution  p^  of  the  vector  of 

estimating  equations 

Z  lUj.r.lXj  :  0  (1) 

i  =  l  *x  1  *l  ' 

T  T 

where  r^  -  y^  -  x^  p  and  x^  is  the  ith  row  of  X.  We  will 
consider  only  n(x,r)  functions  of  the  form 

tj(x.r)  =  v(x)^c(^TjT)  (2) 

where  a  is  a  scale  factor  that  may  be  estimated  from  the  data, 
v(x)  is  a  nonnegative  weight  function,  and 

ft  if  | t | <c 

vcm  = 

csign(  t)  if  1 1 1  > c 

which  is  known  as  Huber’s  V  function.  This  form  of  n(x,r),  for 

the  special  case  with  v(x^)  =  1  -h ^ the  leverage  of  x-  ,  was 

proposed  in  Handschin,  Schweppe,  Kohlas  and  Fiechter  (1975)  and  is 
referred  to  as  Schweppe’s  form.  It  is  discussed  by  Hampel  (1978, 
Section  6)  where  he  says  that  this  is  the  most  intuitive  way  to 
bound  influence  in  both  the  residual  and  design  space.  Huber 
(1981,  Section  7.9)  recommends  this  choice  for  v<  and  again,  in 
Huber  (1983,  Section  6),  he  reaffirms  this  recommendation.  If  we 


take  v(x)  ■  1,  we  get  Huber’s  M-estimate  which  has  unbounded 

influence  in  the  design  points.  If,  in  addition,  we  specify  a 
large  value  for  c,  the  resulting  estimate  is  essentially  least 
squares.  For  a  more  complete  description  of  M-estimates  see 
Hampel  et  al  (1986,  Chapter  6)  and  Hettmansperger  (1987). 

The  following  form  of  n(x,r)  can  be  used  in  a  weighted 

least  squares  algorithm  to  compute 

n(x,r)  =  w(x,r)r/o  (3) 


where 


w(x,r)  =  min( 1 ,cov(x)/ |r  1 1 . 


In  the  Appendix,  for  a  particular  choice  of  v(x),  we  give  the 


MINITAB  commands  to  compute  a  I-step  version  of  2^  using  weighted 

least  squares  to  solve  (1)  with  weights  given  by  (3). 

Maronna  and  Yohai  (1981)  show,  under  mild  regularity 
1/2  ' 

conditions,  that  n  (2^-2)  is  asymptotically  normal  with  mean  0 


and  covariance  matrix  U  =  where 


M  =  E(  >i(x , r )  ]  =  E(i  ) xxT  1 


d  yc' cTsTT  - 


Q  :  E( n2( X, r )XXT1  =  E[w2(x,r)  ^  xxTl 

a1  " 


dv  ( t ) 

>c(t)  =  Tt 


1  if  | t | <c 
0  if  |t | >c. 


Obvious  estimates  of  M  and  Q  are  given  by 


A  1  n  r ;  t 

M  '  no  ^  *c*  ov(x  •"")  '  *i*i 


a 


iiivbv  niii  mini 


winniiWMuinuiHHniiimuii 


*  1  n  n  9  T 

«  =  jfi  »  <5t.rt)rizxixiT, 


respectively,  where  r^  =  ^  -  x^p^.  Thus  the  asymptotic 


covariance  matrix  of  0^,  s  n~^U  can  be  estimated  by 


VM  =  |n/(n-p-l ) }n  QM  \ 


The  bias  correction  factor  n/(n-p-l)  was  recommended  by  Huber 
(1981,  page  173)  and  is  there  to  recapture  the  classical  formula 
in  the  least  squares  case  (u(x)  ■  1  and  c  =  «°) .  If  we  let  d^  = 

Ti/lavlx; ))  and  w.  =  w(x. ,r; )  then  V  is  given  by 
1  ^ 1  1  ^11 


VM  =  Hrjrt  1  ,f,V  'OilSiSi1!  *‘t  ,*,»!  VM!1!  • 


2_  2„  ..  T, 


n 


To  implement  ^  the  user  has  to  decide  on  v(x),  a  and  c. 
Welsch  (1980)  proposed  the  following  choices  for  v(x)  and  a 


v(x)  =  ( 1-h: )/yH“ 

*  li 


o  =  S 


( i ) 


where  h{  is  the  leverage  of  defined  to  be  the  ith  diagonal 
1  *>1 


element  from  the  least  squares  projection  matrix  and  is 


the  root  mean  square  error  from  the  least  squares  fit  with  the 
ith  case  deleted.  These  choices  can  be  motivated  as  follows. 
First  notice  that  in  this  case 


d1  =  DFITSi  =  ti(hi/(l-hi 


1/2 


where  t^  is  the  ith  studentized  or  t-residual  and  is  given  by 


1/2 

t-  =  r ■ / { s ,■ . ( 1-h • )  ).  The  value  d;  is  also  a  measure  of  the 


l  '  i'  (  i  ) ;  *  "i . -  ~i 

standardized  change  in  the  least  squares  fit  when  the  ith  case  is 
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deleted.  It  is  therefore  an  important  diagnostic  quantity  for 
determining  cases  with  large  influence  on  the  least  squares  fit. 
Referring  back  to  (3),  we  see  that  as  long  as  |d^|  <  c  the  ith 
case  is  not  downweighted  in  the  robust  fit;  otherwise,  it  is 
downweighted  in  proportion  to  the  excess  of  |d^|  over  c.  For 
the  choice  of  c,  we  take 

c  =  2[ (p+1 )/n]1/2 

as  recommended  by  Belsley,  Kuh  and  Welsch  (1980,  page  28)  for 
diagnostic  purposes  in  conjunction  with  least  squares. 

In  the  Appendix,  we  present  the  MINITAB  commands  to  compute 
and  Vy  based  on  the  above  choices  of  v(x),  a  and  c.  Computing 

A  A 

Vy  using  MINITAB  can  be  made  easier  by  reexpressing  V.  First 

Q  m 

recall  that  2  a-x-x-1  =  X'AX  where  A  is  a  diagonal  matrix 
i=l 

with  diagonal  elements  a^,...,an>  Then  letting  =  diagonal 
Wc'  (di  ) ,  .  . .  ,  V-c '  (  cIh  ) }  and  D2  =  diagonal  {w12i~12  ,  .  .  .  .wn2rn2}  we  can 

A. 

reexpress  as 

\  (X'DjXr^X'DjXHX'DjX)-1. 

2.2.  I ^-estimates. 

The  i ^-estimate  ^  is  defined  as  the  value  of  £  that 
minimizes 

H  rr* 

.2|?i  -  *i  ?l- 

For  a  review  of  the  historical  development  of  l ^-estimates  the 
reader  is  referred  to  Bloomfield  and  Steiger  (1983). 

Bassett  and  Koenker  (1978)  show,  under  mild  regularity 


conditions,  that  -p)  is  asymptotically  normal  with 


2  - 1 

mean  0  and  covariance  matrix  U  =  t  (X'X)  where  r  =  1 / { 2 f ( 0 ) } 


The  problem  of  estimating  r  has  been,  extensively  and 
almost  exclusively,  studied  in  the  one-sample  location  model 
setting,  which  occurs  when  p  =  0  and  X  consists  of  a  column  of 
ones.  We  now  review  the  results  from  this  setting.  Let 


Y(l)  -  ^(2)  -  •••  -  denote  the  order  statistics  and  0  the 


•  V 


sample  median  of  Yj,  Yg, 

For  the  case  that  n  is  odd  (i.e.  n=2m+l),  Maritz  and 

Jarrett  (1978)  and  Efron  (1979)  independently  proposed  the 

2 

following  estimator  of  r 4 


n 


thje  =  n[i?1wiv(i)2  -  ,i|1wi¥<i) 


where 


i/n 


(m 


_  f  i.  /  11  _  __ 

^  s  um(l-u)mdu. 

!r  { i-U/n 


The  following  related  estimate  was  proposed  by  Sheather  (1986) 


*2  n 
rt  =  n(  Z  W 


*v2 


i  =  l 


i‘  (i) 


-  f  2  W’Y, . 


ih  1 [i) 


(4) 


where 


W*  =  J(±4^)/  Z 


k=  1 


and 


(u) 


— ^  um( 1-u >m  . 
1  m!  )£ 


Under  the  conditions  that  f  is  positive  and  continuous  in  a 
neighborhood  of  0  and  E[ log (  1  + 1 e j |  ]  <  »,  Babu  (1986)  has  shown 


A  2  2 

that  -*  r  almost  surely  as  n  -»  ». 


In  a  large  Monte  Carlo  study,  Sheather  and  McKean  (1987) 


compared  various  estimates  of  r  in  terms  of  their  ability  to 

*  *  2  "2 

studentize  0.  They  found  both  and  r*  performed  well 

with  little  difference  between  them. 

The  other  estimator  of  r  that  performed  well  in  the  Monte 
Carlo  study  of  Sheather  and  McKean  (1987)  was  first  proposed 
by  Siddiqui  (1960).  This  estimator  of  r  is  given  by 

Td  =  n(Y([n/2]+d)  '  Y( [n/2]-d+l)1/(4d) 

where  d  =  o(n).  Bloch  and  Gastwirth  (1968)  found  that  the  value 

of  d  that  minimized  the  first  order  term  in  the  mean  square  error 
4/5 

is  0(n  ).  In  another  Monte  Carlo  study,  McKean  and  Schrader 

(1984)  found  that  the  tests  resulting  from  studentizing  0  by 
r^/n1'  d  =  Ofn-1'3)  were  very  liberal.  Following  a  proposal 

by  Lehmann  (1963),  McKean  and  Schrader  (1984)  found  that 
d  =  0(n*^)  was  an  improvement  over  d  =  0(n^^).  Recently,  Hall 


and  Sheather  (1987)  have  developed  an  edgeworth  expansion  for  the 
studentized  version  of  0.  They  found  that  the  value  of  d  that 
minimizes  the  first  order  correction  term  in  this  expansion  is 
0(n  ).  Unfortunately,  the  constant  involved  depends  on  the 

underlying  density  in  a  complicated  manner,  making  it  difficult  to 
estimate  in  practice.  A  number  of  other  estimators  of  r  exist. 

For  a  review  of  such  estimators  in  the  one-sample  setting  see 
Sheather  (  1987 ) , 


We  now  return  to  the  more  general  problem  of  estimating  * 

T  A 

based  on  the  residuals  r-  =  y.  -  x,p»  .  Since  p  +  1  of  these 

l  1  ^  1  ■v  C  ^ 

residuals  will  be  exactly  zero,  McKean  and  Schrader  (1987),  following 


a  suggestion  by  Hill  and  Holland  (1977),  suggest  that  these  zero 
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residuals  be  eliminated  when  estimating  t.  Let  n*  =  n-p-1  and  r|j|  < 
r  *  2  j  -  •••  -  r*n*-p-l)  denote  the  ordered  remaining  residuals.  Then 
we  recommend  the  estimate  of  Sheather  (1986)  given  by  (4)  with  Y^j 
replaced  by  r|^j  and  n  replaced  by  n*.  The  resulting  estimate  of  the 

*  _  i 

asymptotic  covariance  matrix  of  p0  ,  V.  =  n  U  is  given  by 

'll  1 1 

V,  =  (n/ ( n-p-1 )}n-1T^(X'X)-1 
*1  s 

where  again  the  factor  n/(n-p-l)  acts  as  a  bias  correction. 

3.  Example. 

The  data  in  Table  1  are  taken  from  Simkin  (1978)  and  are  annual 
rates  of  growth  of  average  prices  in  the  main  cities  of  Free  China 
from  1940  to  1946.  In  this  example,  interest  is  clearly  in  the  rate 
of  change  of  the  growth  in  prices  which  corresponds  to  in  the 
model  below. 

Table  1 


Year ( x ) 

40 

41 

42 

43 

44 

45 

46 

Growth  of  prices(y) 

1.62 

1.63 

1.90 

2.64 

2.05 

2.13 

1.94 

-10 


The  standard  error  estimates  denoted  by  sef^jy)  and  se(0-,p  )  were 


U 


1 


obtained  by  taking  the  square  root  of  the  second  diagonal  element  of 
Yy  and  ,  respectively.  As  a  check  on  the  accuracy  of  these 


estimates,  we  also  calculated  estimates  of  the  standard  errors  of  p 

and  p:t  using  the  bootstrap.  A  description  of  the  bootstrap 
*'l 


1M 


algorithm  as  it  is  applied  to  residuals  in  the  regression  setting  can 
be  found  in  Efron  and  Tibshirani  (1986).  In  the  case  of 
£ j-estimates ,  the  bootstrap  algorithm  was  applied  to  the  five 
residuals  that  remained  after  the  two  that  were  identically  zero  were 
eliminated.  For  the  M-  and  £  ^ -estimates,  601  and  1000  repetitions  of 
the  bootstrap  algorithm  were  performed,  respectively.  The  standard 
error  estimates  seg<^iyl  and  sei£^  )  were  each  calculated  as  0.75 


times  the  interquartile  range  of  the  bootstrap  estimates  of  p^.  This 
function  of  the  interquartile  range  was  used  in  preference  to  the 
standard  deviation  because  both  histograms  of  bootstrap  estimates, 
although  normal  in  shape,  had  many  more  outliers  than  one  would 
expect  from  a  normal  distribution.  Note  for  both  the  M-  and 
£j-estimate  of  p ^  the  close  agreement  between  the  estimates  of 
standard  error  obtained  from  V  and  the  bootstrap.  For  the  purposes 
of  comparison  we  also  report  in  Table  2  the  least  squares  estimate 
of  anc*  estimated  standard  error.  The  efficiency  gain 

by  using  the  M-  or  £ ^-estimate  of  pj  instead  of  the  least  squares 


estimate  is  quite  striking. 


-11- 


Table  2 


^1M  =  ^*075 

=  °* 

102 

*lLs  =  °'075 

A  A 

se(01M)  =  0.033 

8e(^U1l 

=  0.049 

1  *  =  0.063 

se  ^1LS ) 

seB(^lM)  =  °‘033 

=  0.045 

Appendix 
Least  Squares: 

NAME  'Y',  'XI' .  'XP',  'SRI',  '  YHI'  ,  'TR',  'DF',  'HI' 

REGR  '  Y'  on  p  in  'XI',  ...,  'XP',  put  std  resids  in  'SRI',  fit. 
in  'YHI' ; 

TRESIDS  in  'TR'  ; 

DFITS  in  'DF'  ; 

HI  in  'HI'  . 

PRINT  'Y'  'XI'  ...  'XP'  'YHI'  'TR'  'DF'  'HI' 

PLOT  'TR'  vs  'YHI' 

Robust : 

NAME  'WEL'  'W'  'RESIDS'  ' SR2 '  ' YH2 ' 

LET  K1  =  2*SQRT( (p+lJ/n) ) 

LET  'WEL'  =  Kl/ABSO( 'DF' I 
RMIN  1  'WEL'  into  '  W" 

REGR  'Y'  on  p  in  'XI'  ...  ' XP'  ' SR2'  ' YH2' ; 

WEIGHTS  in  '  W"  ; 

RESIDS  in  'RESIDS' . 

PRINT  'Y'  '  YH2'  '  W" 

AVERAGE  'W' 


NAME  'DIFF'  'IND'  'WT' 

LET  ' DIFF'  =  K1  -  ABSOI ’ DF'  ) 

LET  ' IND'  =  . 5* ( SIGN! ' DIFF'  )♦!) 

REGR  '  Y'  on  p  in  'XI ’  ...  'XP'; 
WEIGHTS  in  ' IND' ; 

XPXINV  in  Ml. 


J(d. 


iX'  DjXr1 


LET  'WT' 


I'r  **2)*( '  RESIDS'  **2) 


REGR  ' Y'  on  p  in  ’XI'  ...  '  XP' ; 

WEIGHTS  in  '  WT'  ; 

XPXINV  M2. 


(  X '  D  2  X  )  ~  1 

INVERSE  M2  into  M3 
MILT  Ml  by  M3  into  M4 

MULT  M4  by  Ml  into  M5  M’^M'1 

LET  K2  =  n/ ( n-p- 1 ) 

MULT  K2  by  M5  into  M6 
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